Vector Road Map Registration to Oblique Wide Area Motion Imagery by Exploiting Vehicle Movements
Abstract
We present a novel methodology for accurately registering a vector road map to wide area motion imagery (WAMI) gathered from an oblique perspective by exploiting the local motion associated with vehicular movements. Specifically, we identify and compensate for global motion from frame to frame in the WAMI, which then allows ready detection of local motion that corresponds strongly with the locations of moving vehicles along the roads. Minimization of the chamfer distance between these identified locations and the network of road lines identified in the vector road map provides an accurate alignment between the vector road map and the WAMI image frame under consideration. The methodology provides a significant improvement over the approximate geo-tagging provided by on-board sensors and effectively side-steps the challenge of matching features between the completely different data modalities and viewpoints for the vector road map and the captured WAMI frames. Results over a test WAMI dataset indicate the effectiveness of the proposed methodology: both visual comparison and numerical metrics for the alignment accuracy are significantly better for the proposed method as compared with existing alternatives.

Introduction

Recent technological advances have made available a number of airborne platforms for capturing imagery [1,2]. One specific area of emerging interest for applications is Wide Area Motion Imagery (WAMI), where images at temporal rates of 1–2 frames per second can be captured for relatively large areas that span substantial parts of a city while maintaining adequate spatial detail to resolve individual vehicles [3]. WAMI platforms are becoming increasingly prevalent and the imagery they generate is also feeding a corresponding thrust in large-scale visual data analytics. The effectiveness of such analytics can be enhanced by combining the WAMI with alternative sources of rich geo-spatial information such as road maps. In this paper we focus on near real-time registration of vector road-map data to WAMI and propose a novel methodology that exploits vehicular motion for accurate and computationally efficient alignment.

Registering road map vector data with aerial imagery yields a rich source of geo-spatial information that can be used for many applications. One application of interest is moving vehicle detection and tracking in wide area motion imagery (WAMI). By registering the road network to aerial imagery, we can easily filter out false detections that occur off roads. Another interesting application is to detect and track a suspicious vehicle that goes off road. These applications depend on accurate road network alignment with the aerial imagery, which is the focus of this paper.

In general, successive WAMI video frames are related by both global and local motions. The global motion arises from the camera movement due to the aerial platform movement, and it can be parameterized as a homography between the spatial coordinates of successive frames under the assumption that the captured scene is planar. The local motion arises due to the local movement of objects in the scene. Local motion in WAMI for urban scenes is dominated by vehicle movements on the network of roads within the captured area. We exploit these vehicular movements to develop an effective registration scheme for aligning vector road map data with the captured WAMI frame.

Figure 1: Road network alignment. (a) Using only aerial frame metadata; (b) using our proposed algorithm.
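As an illustrative sketch of the global/local motion decomposition described above (not the paper's implementation), the following Python snippet estimates the frame-to-frame homography with RANSAC, compensates the camera motion, and flags residual local motion. The grayscale frame arrays, the ORB detector, and the difference threshold are assumptions chosen for illustration.

```python
# Minimal sketch of global-motion compensation followed by local-motion
# detection, assuming grayscale frames `frame_t` and `frame_t1` (NumPy arrays).
import cv2
import numpy as np

def detect_local_motion(frame_t, frame_t1, diff_thresh=30):
    # Match features between the two frames (ORB is used here only as an
    # illustrative detector; any conventional detector/descriptor could serve).
    orb = cv2.ORB_create(nfeatures=4000)
    kp1, des1 = orb.detectAndCompute(frame_t, None)
    kp2, des2 = orb.detectAndCompute(frame_t1, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)

    # Robustly estimate the frame-to-frame homography (global camera motion)
    # with RANSAC, under the planar-scene assumption.
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    H, _ = cv2.findHomography(pts2, pts1, cv2.RANSAC, 3.0)

    # Warp frame t+1 into the reference frame of frame t and compute the
    # compensated (displaced) frame difference.
    h, w = frame_t.shape[:2]
    warped = cv2.warpPerspective(frame_t1, H, (w, h))
    diff = cv2.absdiff(frame_t, warped)

    # Large residual differences correspond predominantly to moving vehicles,
    # i.e., local motion not explained by the global homography.
    return diff > diff_thresh
```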
WAMI frames are usually captured from platforms equipped with a Global Positioning System (GPS) and an Inertial Navigation System (INS), which provide location and orientation information that is usually stored with the aerial image as meta-data. This meta-data can be used to align a road network extracted from an external Geographic Information System (GIS) source. However, as illustrated by the example in Fig. 1(a), the accuracy of the meta-data is limited and only provides an approximate alignment.

Registering an aerial image directly with a geo-referenced vector road map is a challenging task because of the differences in the nature of the data in the two formats: in one case the data consists of image pixel values, whereas in the other it is described as lines/curves connecting a series of points. Because of these inherent differences, one cannot readily define low/mid-level features that are invariant to the representations and can be used for registration in the way that conventional feature detectors, such as SIFT (Scale-Invariant Feature Transform) [4], are used for finding corresponding points in images.

For static imagery, much research has been done on aligning vector road maps to aerial imagery, a process normally referred to as conflation. In general, conflation refers to a process that fuses spatial representations from multiple data sources to obtain a new, superior representation. In [5–7], road vector data are aligned with an aerial image by matching the road intersection points in both representations. The crucial element in these prior works is the detection of road intersections in the aerial image. With the availability of hyper-spectral aerial imagery, spectral properties and contextual analysis are used in [5] to detect these road intersections in the aerial scene. However, road segmentation is not robust across different natural scenes, especially when roads are obscured by shadows from trees and nearby buildings. In [6], a Bayes classifier is used to classify pixels as on-road or off-road, and then localized template matching is used to detect the road intersections. However, to obtain reasonable accuracy with the Bayes classifier, a large number of manually labeled training pixels is required for each dataset. In [7], corner detection is used to detect the road intersections, which is not reliable, especially in high-resolution aerial images that contain wide roads on which simple corner detection fails.

Work on registration of (non-static) WAMI frames to geo-referenced vector road maps has received comparatively less attention, even though the capability to perform such registration in a computationally efficient manner is crucial for a number of real/near-real-time analysis applications for WAMI, as already mentioned. Some of the prior work on this problem overcomes the challenge posed by the fundamentally different modalities of the WAMI and vector datasets by using an auxiliary geo-referenced image that is already aligned with the vector road map. The aerial image frames are then aligned to the auxiliary geo-referenced image by using conventional image feature matching methods.
For example, in [8], for the purpose of vehicular tracking, the aerial frame is geo-registered with a geo-referenced image and a GIS database is then used for road network extraction. This road network is used to regularize the matching of the current vehicle detections to the previously existing vehicular tracks. In an alternative approach that relies on 3D geometry, in [9], SIFT is used to detect correspondences between ground features from a small-footprint aerial video frame and a geo-referenced image. This geo-registration helps to estimate the camera pose and depth map for each frame, and the depth map is used to segment the scene into buildings, foliage, and roads using a multi-cue segmentation framework. The process is computationally intensive, and the use of the auxiliary geo-referenced image is still plagued by problems with identification of corresponding feature points because of illumination changes, different capture times, severe viewpoint changes in aerial imagery, and occlusion. State-of-the-art feature point detectors and descriptors, such as SIFT (Scale-Invariant Feature Transform) [4] and SURF (Speeded Up Robust Features) [10], often find many spurious matches that cause robust estimators such as RANSAC [11] to fail when estimating a homography. Also, these methods cannot work directly if the aerial video frames have a different modality (infra-red, for example) than the geo-referenced image. Last, but not least, a single homography represents the relation between two images only when the scene is close to planar [12]. In WAMI, aerial video frames are usually taken from an oblique camera array to cover a large ground area from a moderate height, and the scene usually contains non-ground objects such as buildings, trees, and foliage. Thus the planar assumption does not necessarily hold across the entire imagery, although it is not unreasonable for the road network.

In this paper, we propose an algorithm that accurately aligns a vector road network to WAMI aerial video frames by detecting the locations of moving vehicles and aligning the detected vehicle locations with the network of roads in the vector road map. The vehicle locations are readily detected by performing frame-to-frame registration using conventional image feature matching methods and computing compensated frame differences to identify local motion that differs significantly from the overall global motion resulting from the camera movement. Such local motion is predominantly due to moving vehicles, and the regions where the compensated frame differences are large correspond (predominantly) to vehicle locations. We align the WAMI frames to the vector road map by estimating the projective transformation parameters that, after appropriate application of the transformation, minimize a metric defined as the sum of minimum squared distances from the detected vehicle locations to the corresponding nearest points on the network of roads. This metric is the well-known chamfer distance, which can be efficiently computed via the distance transform [13]. The chamfer distance serves as an ideal quantitative metric for the degree of misalignment because it does not require any feature correspondences or computation of displaced frame differences, both of which are inappropriate for our problem setting because of the different modalities of the data. By exploiting vehicle detections and using the vector road network, we implicitly transfer both the aerial image and the geo-referenced road map into a common representation that can be easily matched.
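To make the chamfer-distance computation concrete, the sketch below shows one way to evaluate it efficiently via the distance transform. The names `road_mask` (a binary raster of the road network, True on road lines) and `detections` (an array of vehicle-detection coordinates) are assumptions introduced for illustration, not identifiers from the paper.

```python
# Minimal sketch: chamfer cost between vehicle detections and a rasterized
# road network, evaluated via the Euclidean distance transform.
import numpy as np
from scipy.ndimage import distance_transform_edt

def chamfer_cost(road_mask, detections):
    # For every pixel, the distance to the nearest road pixel. Computed once;
    # evaluating the cost for any set of detections is then a simple lookup.
    dist_to_road = distance_transform_edt(~road_mask)

    rows = np.clip(np.round(detections[:, 0]).astype(int), 0, road_mask.shape[0] - 1)
    cols = np.clip(np.round(detections[:, 1]).astype(int), 0, road_mask.shape[1] - 1)

    # Sum of squared distances from each detection to its nearest road point.
    return np.sum(dist_to_road[rows, cols] ** 2)
```

Because the distance transform depends only on the road raster, it can be precomputed once and reused while the transformation parameters are varied, which is what makes the chamfer distance inexpensive to minimize.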
In other words, unlike traditional methods, our algorithm does not directly estimate any feature correspondences between the WAMI image frames and the vector road maps. Instead, it aligns two binary images representing the vehicle detections and the network of road lines identified in the vector map, thereby providing a more accurate and robust alignment. A sample result from our algorithm is shown in Fig. 1(b), where it can be appreciated that the method provides an accurate alignment to the road network. Our main assumption here is that the investigated scene contains a forked road network, which is a reasonable assumption for WAMI, where each frame covers a city-scale ground area. Our algorithm does not depend on the aerial camera sensor type; for example, it can be used directly with an infra-red aerial camera.

This paper is organized as follows. The next section explains our proposed algorithm. Results and a comparison against alternative methods are presented in the following section. The final section summarizes concluding remarks.

Proposed algorithm for vehicle motion based WAMI alignment

A high-level overview of the proposed algorithm is shown in block-diagram format in Fig. 2 using illustrative example images. Our algorithm consists of three major parts. First, we perform frame-to-frame registration to align temporally adjacent WAMI frames, denoted by I_t and I_{t+1}, into the common reference frame of I_t and compute the displaced frame-to-frame difference [14] between them. The regions of significant magnitude in these frame-to-frame differences correspond predominantly to the locations of moving vehicles. Then we use the meta-data associated with I_t along with the vector road network to generate a road network coarsely aligned with I_t. Finally, we estimate the final alignment between the coarsely aligned road network and the vehicle detections by minimizing the chamfer distance [13] between them, which corresponds to minimizing the sum of the squared distances between each vehicle detection and the corresponding nearest point on the road network. The chamfer distance measures how close the vehicle detections are to the road network and therefore applies nicely to our problem. A code sketch of this final refinement step follows.

[Figure 2: Block-diagram overview of the proposed algorithm; inputs include the WAMI frames, the flight meta-data, and the vector road map.]
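The sketch below illustrates the final refinement step under stated assumptions: `dist_to_road` is the precomputed distance transform of the coarsely aligned road raster, `detections_xy` holds the (x, y) vehicle locations, and eight free homography parameters are refined starting from the identity. The derivative-free Powell optimizer is an illustrative choice; the paper does not prescribe a specific minimization routine.

```python
# Minimal sketch: refine an eight-parameter projective transformation by
# minimizing the chamfer cost looked up from a precomputed distance transform.
import numpy as np
from scipy.optimize import minimize

def refine_alignment(dist_to_road, detections_xy):
    h, w = dist_to_road.shape
    ones = np.ones((detections_xy.shape[0], 1))
    pts_h = np.hstack([detections_xy, ones])          # homogeneous coordinates

    def cost(p):
        H = np.append(p, 1.0).reshape(3, 3)           # h33 fixed to 1
        mapped = pts_h @ H.T
        xy = mapped[:, :2] / mapped[:, 2:3]
        cols = np.clip(np.round(xy[:, 0]).astype(int), 0, w - 1)
        rows = np.clip(np.round(xy[:, 1]).astype(int), 0, h - 1)
        # Chamfer cost: sum of squared distances to the nearest road point.
        return np.sum(dist_to_road[rows, cols] ** 2)

    p0 = np.array([1, 0, 0, 0, 1, 0, 0, 0], dtype=float)  # identity start
    res = minimize(cost, p0, method="Powell")
    return np.append(res.x, 1.0).reshape(3, 3)
```

Starting from the identity is reasonable here because the meta-data already provides a coarse alignment, so the refinement only needs to search a small neighborhood of transformations.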